What's in Your Data: Data Profiler - Austin Walters, Jeremy Goodsitt | PyData Global 2021

Duration: 01:23:15

Views: 489

Likes: 6

Date Created: Jan, 2022

Channel: PyData

Category: Science & Technology

Tags: python, learn to code, education, software, pydata, learn, coding, how to program, julia, opensource, scientific programming, numfocus, python 3, tutorial

Description: What's in Your Data: Data Profiler - an Open Source Solution to Explain Your Data

Speakers: Austin Walters, Jeremy Goodsitt

Summary

Data understanding is crucial for most machine learning applications. As data scientists and engineers, we need to answer these questions for every project: Is our data secure? What is in our data? How do we monitor data properties over time? The DataProfiler, an open source project from Capital One, is a Python library designed to facilitate data analysis, monitoring, and sensitive data detection.

Description

DataProfiler was designed to accept a wide range of data formats, including csv, avro, parquet, json, text, and pandas DataFrames. Whether the data is structured, semi-structured, or unstructured, the library can identify the schema, statistics, and entities in the data. In addition, the DataProfiler provides a cutting-edge pre-trained deep learning model to efficiently identify sensitive information (or PII, such as customer names, physical addresses, bank account numbers, and credit card numbers). This helps companies detect sensitive data in different data sources and formats.

Because the data labeler is interchangeable, DataProfiler can be customized to help users learn what is in their data, and its models can be modified as needed. Running multiple models on the same dataset is easy, since choosing a preexisting data labeler to train and predict takes just a few lines of code.

We invite data scientists, machine learning engineers, and software engineers, from beginner to expert level, to learn how to extract data properties in an efficient way with the DataProfiler.

Austin Walters's Bio

Machine learning engineer, manager, and researcher focused on natural language processing and automated machine learning.

GitHub: github.com/lettergram
LinkedIn: linkedin.com/in/austingwalters
Website: austingwalters.com

Jeremy Goodsitt's Bio

Jeremy Goodsitt’s doctoral studies were at the University of Illinois, where he worked on improving medical diagnostic techniques through computer vision and optimization techniques. He joined Capital One in 2017, where he has worked on machine learning optimization infrastructure, NLP model development such as the sensitive data labeler within the DataProfiler, and the engineering design behind the open source library, DataProfiler.

GitHub: github.com/JGSweets
LinkedIn: linkedin.com/in/jeremy-goodsitt

PyData Global 2021

Website: pydata.org/global2021
LinkedIn: linkedin.com/company/pydata-global
Twitter: twitter.com/PyData

pydata.org

PyData is an educational program of NumFOCUS, a 501(c)3 non-profit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R.

PyData conferences aim to be accessible and community-driven, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cutting-edge use cases.

00:00 Welcome!
00:10 Help us add time stamps or captions to this video! See the description for details.

Want to help add timestamps to our YouTube videos to help with discoverability? Find out more here: github.com/numfocus/YouTubeVideoTimestamps
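As a rough illustration of the "few lines of code" workflow the description mentions, the sketch below loads a small CSV and asks DataProfiler for a report. It assumes the package is installed (for example with `pip install DataProfiler[ml]`); the helper names `write_sample_csv` and `profile_csv` and the sample rows are made up for this example and do not come from the talk.

```python
import csv
from pathlib import Path


def write_sample_csv(directory):
    """Write a tiny made-up CSV so the sketch has something to profile."""
    path = Path(directory) / "sample.csv"
    rows = [
        ["name", "age", "city"],
        ["Ada", "36", "London"],
        ["Grace", "45", "New York"],
        ["Alan", "41", "Manchester"],
    ]
    with open(path, "w", newline="") as handle:
        csv.writer(handle).writerows(rows)
    return str(path)


def profile_csv(path):
    """Return a DataProfiler report (a dict) for the file at `path`."""
    # Imported here so the module still loads where the package is missing.
    # Assumption: DataProfiler has been installed beforehand.
    import dataprofiler as dp

    data = dp.Data(path)         # file format is detected automatically
    profile = dp.Profiler(data)  # gathers schema, statistics, and labels
    return profile.report(report_options={"output_format": "compact"})


if __name__ == "__main__":
    import tempfile

    with tempfile.TemporaryDirectory() as tmp:
        report = profile_csv(write_sample_csv(tmp))
        print(sorted(report.keys()))
```

The same `dp.Data` call is meant to accept the other formats listed above, so swapping in a parquet or json path should need no other changes, though that is not exercised here.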
